Approximating the Longest Increasing Sequence and Distance from Sortedness in a Data Stream
Authors
Abstract
We revisit the well-studied problem of estimating the sortedness of a data stream. We study the complementary problems of estimating the edit distance from sortedness (Ulam distance) and estimating the length of the longest increasing sequence (LIS). We present the first sub-linear space algorithms for these problems in the data stream model.
• We give an O(log n) space, one-pass randomized algorithm that gives a (4 + ε) approximation to the Ulam distance.
• We give O(√n) space deterministic (1 + ε) one-pass approximation algorithms for estimating the length of the LIS and the Ulam distance.
• We show a tight lower bound of Ω(n) on the space required by any randomized algorithm to compute these quantities exactly. This improves an Ω(√n) lower bound due to Vee et al. and shows that approximation is essential to get space-efficient algorithms.
• We conjecture a space lower bound of Ω(√n) on any deterministic algorithm approximating the LIS. We are able to show such a bound for a restricted class of algorithms, which nevertheless captures all the algorithms described above.
Our algorithms and lower bounds use techniques from communication complexity and property testing.
∗Work done in part while the author was at IBM Almaden
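For orientation, the two quantities in the abstract can be computed exactly with a standard one-pass baseline. The sketch below is not one of the paper's sub-linear algorithms: it is the classical patience-sorting method, and it assumes that distance from sortedness is measured as the number of elements that must be removed (or moved) to leave an increasing sequence, i.e. n minus the LIS length. The function names are illustrative.

```python
from bisect import bisect_left


def lis_length(stream):
    """Exact length of the longest strictly increasing subsequence, in one pass.

    tails[k] holds the smallest possible last value of an increasing
    subsequence of length k + 1 seen so far. The list can grow to the
    LIS length, so this baseline needs linear space in the worst case.
    """
    tails = []
    for x in stream:
        i = bisect_left(tails, x)  # first position whose tail is >= x
        if i == len(tails):
            tails.append(x)        # x extends the longest subsequence found
        else:
            tails[i] = x           # x is a smaller tail for length i + 1
    return len(tails)


def distance_from_sortedness(seq):
    """Number of elements outside a longest increasing subsequence (assumed definition)."""
    seq = list(seq)
    return len(seq) - lis_length(seq)


if __name__ == "__main__":
    data = [3, 1, 2, 5, 4]
    print(lis_length(data), distance_from_sortedness(data))
```

Because `tails` may hold as many entries as the LIS is long, this exact method uses linear space on a nearly sorted stream, which is consistent with the Ω(n) lower bound for exact computation stated above.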
Related resources
Lecture 15: Sortedness, Connectivity, MST Weight, Components
which consists of an increasing sequence of m decreasing sequences, each of length n/m. The longest increasing subsequence has length m, but the only way for the sample to imply that the input is not sorted is if two chosen elements land in the same decreasing sequence. The probability that this happens is at most (s choose 2)/m ≤ s²/(2m), which is o(1) if s is o(√m). In particular, for ε = 1/2 ...
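The construction in this excerpt can be made concrete with a small sketch. It assumes m divides n and that the s sample positions are chosen uniformly at random; the helper names are hypothetical and do not come from the lecture notes.

```python
from math import comb


def blocks_instance(n, m):
    """Increasing sequence of m decreasing blocks, each of length n // m.

    Assumes m divides n. Block b holds the values b*k .. (b+1)*k - 1
    in descending order, where k = n // m.
    """
    k = n // m
    out = []
    for b in range(m):
        out.extend(range((b + 1) * k - 1, b * k - 1, -1))
    return out


def sample_reveals_disorder(a, positions):
    """True if the values at the sampled positions, read left to right, are not increasing."""
    picked = [a[i] for i in sorted(set(positions))]
    return any(x > y for x, y in zip(picked, picked[1:]))


def detection_bound(s, m):
    """Union bound (s choose 2) / m on two of s uniform samples sharing a block."""
    return comb(s, 2) / m


if __name__ == "__main__":
    a = blocks_instance(6, 3)
    print(a)
    print(sample_reveals_disorder(a, [0, 2, 4]))  # one sample per block
    print(sample_reveals_disorder(a, [0, 1]))     # two samples in the same block
    print(detection_bound(10, 10_000))
```

Samples that hit distinct blocks always look sorted, so the sample exposes disorder only when two positions fall in the same block, which is the event the bound controls.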
On Differentially Private Longest Increasing Subsequence Computation in Data Stream
Many important applications require continuous computation of statistics over data streams. Activity monitoring, surveillance and fraud detection are some settings where it is crucial for the monitoring applications to protect users' sensitive information in addition to efficiently computing the required statistics. In the last two decades, a broad range of techniques for time-series and str...
Measuring the Similarity of Trajectories Using Fuzzy Theory
In recent years, with the advancement of positioning systems, large amounts of movement data have become available. One method of discovering knowledge from this type of data is to measure the similarity of the trajectories produced by moving objects. Similarity measurement has also been used in other data mining methods such as classification and clustering and is currently, an...
Application of Data-Mining Algorithms in the Sensitivity Analysis and Zoning of Areas Prone to Gully Erosion in the Indicator Watersheds of Khorasan Razavi Province
Extended abstract. 1. Introduction. Gully erosion is one of the most important sources of sediment in watersheds and a common phenomenon in semi-arid climates, affecting vast areas with different morphological, soil and climatic conditions. This type of erosion is very dangerous due to the transfer of fertile soil horizons, and the reduction of water holding capacity also is a factor for s...
Space efficient streaming algorithms for the distance to monotonicity and asymmetric edit distance
Approximating the length of the longest increasing sequence (LIS) of an array is a well-studied problem. We study this problem in the data stream model, where the algorithm is allowed to make a single left-to-right pass through the array and the key resource to be minimized is the amount of additional memory used. We present an algorithm which, for any δ > 0, given streaming access to an array ...
Publication date: 2006